Search | BVS Regional Portal

1.

Novel Scoring Scale for Quality Assessment of Lung Ultrasound in the Emergency Department.

Balderston, Jessica R; Brittan, Taylor; Kimura, Bruce J; Wang, Chen; Tozer, Jordan.

West J Emerg Med ; 25(2): 264-267, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38596928

ABSTRACT

Introduction: The use of a reliable scoring system for quality assessment (QA) is imperative to limit inconsistencies in measuring ultrasound acquisition skills. The current QA grading scale endorsed by the American College of Emergency Physicians (ACEP) is non-specific, applies irrespective of the type of study performed, and has not been rigorously validated. Our goal in this study was to determine whether a succinct, organ-specific grading scale designed for lung-specific QA would be more precise, with better interobserver agreement. Methods: This was a prospective validation study of an objective QA scale for lung ultrasound (LUS) in the emergency department. We identified the first 100 LUS performed in normal clinical practice in the year 2020. Four reviewers at an urban academic center, who were either emergency ultrasound fellowship-trained or current fellows with at least six months of QA experience, scored each study, yielding 400 reviews in total. The primary outcome was the level of agreement between the reviewers. Our secondary outcome was the variability of the scores given to the studies. For the agreement between reviewers, we computed the intraclass correlation coefficient (ICC) based on a two-way random-effects model with a single rater for each grading scale. We generated 10,000 bootstrapped ICCs to construct 95% confidence intervals (CI) for both grading systems. A two-sided one-sample t-test was used to determine whether the bootstrapped ICCs differed between the two grading systems. Results: The ICC between reviewers was 0.552 (95% CI 0.40-0.68) for the ACEP grading scale and 0.703 (95% CI 0.59-0.79) for the novel grading scale (P < 0.001), indicating significantly more interobserver agreement with the novel scale than with the ACEP scale. The variance of scores was similar (0.93 and 0.92 for the novel and ACEP scales, respectively). 
Conclusion: Interobserver agreement between reviewers was higher with the novel, organ-specific scale than with the ACEP grading scale. More consistent feedback, based on objective criteria specific to the targeted organ, offers an opportunity to improve learners' ultrasound education and their satisfaction with it.

Subject(s)

Emergency Service, Hospital , Lung , Humans , Lung/diagnostic imaging , Prospective Studies , Ultrasonography , Educational Status , Observer Variation , Reproducibility of Results
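The Methods of the lung ultrasound study compute an ICC from a two-way random-effects model with a single rater and then bootstrap 10,000 ICCs for 95% confidence intervals. The following is a minimal, stdlib-only Python sketch of that idea, not the authors' code. It assumes the absolute-agreement form (Shrout and Fleiss ICC(2,1)) and a percentile bootstrap that resamples studies (rows) with replacement. The function names `icc_2_1` and `bootstrap_icc_ci` are hypothetical, and the t-test comparing the two scales is not shown.

```python
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def icc_2_1(scores: Matrix) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is the score that rater j gave to study i.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ICC is undefined when the scores have no variance")
    return (ms_rows - ms_err) / denom


def bootstrap_icc_ci(scores: Matrix, n_boot: int = 10_000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for ICC(2,1), resampling studies (rows)."""
    rng = random.Random(seed)
    n = len(scores)
    stats: List[float] = []
    for _ in range(n_boot):
        sample = [scores[rng.randrange(n)] for _ in range(n)]
        try:
            stats.append(icc_2_1(sample))
        except ValueError:
            continue  # skip degenerate resamples with zero variance
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi


if __name__ == "__main__":
    # Worked example from Shrout and Fleiss (1979): 6 subjects rated by 4 judges.
    data = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
            [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
    print(f"ICC(2,1) = {icc_2_1(data):.2f}")  # ICC(2,1) = 0.29
    print("95% bootstrap CI:", bootstrap_icc_ci(data))
```

Resampling whole rows keeps each study's set of ratings together, which matches the idea that studies, not individual scores, are the sampled units.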

2.

Interrater Variability among Anaesthesiologists Using American Society of Anesthesiologists Physical Status Classification System.

Sharma Bhattarai, Amit; Bista, Navindra Raj; Basnet, Madindra Bahadur; Joshi, Deepak Raj; Shrestha, Anil.

J Nepal Health Res Counc ; 21(4): 543-549, 2024 Mar 31.

Article in English | MEDLINE | ID: mdl-38616581

ABSTRACT

BACKGROUND: The American Society of Anesthesiologists Physical Status classification is used by anaesthesiologists worldwide to classify surgical patients. Many studies have found a moderate degree of interrater variability among anaesthesiologists. The general objective of the study was to determine the interrater variability among Nepalese anaesthesiologists using this classification system. The specific objectives were to determine the correctness of assignment and the interrater variability among anaesthesiologists according to their experience. METHODS: After validation by experts, ten clinical cases were distributed among 130 registered practising anaesthesiologists in Nepal. Respondents were asked to assign each of the ten cases to a specific physical status class. Anaesthesiologists were divided into two groups according to whether they had more or less than five years of clinical experience. RESULTS: We found substantial agreement within the group with < 5 years' experience (0.66), within the group with > 5 years' experience (0.753), and among all raters (0.736). The mean score of the group with less than 5 years of experience was higher, but the difference between the groups' mean scores was not significant (p = 0.595). The overall mean score for both groups was 5.66 (SD 1.66). CONCLUSIONS: The study shows very little variation among registered practising anaesthesiologists in Nepal using the American Society of Anesthesiologists Physical Status classification system.

Subject(s)

Anesthesiologists , Observer Variation , Physical Examination , Humans , Nepal , South Asian People , Physical Examination/classification

3.

The influence of listener experience, measurement scale and speech task on the reliability of auditory-perceptual evaluation of vocal quality.

Alves, Jônatas do Nascimento; Almeida, Anna Alice Figueiredo de; Yamasaki, Rosiane; Lopes, Leonardo Wanderley.

Codas ; 36(3): e20230175, 2024.

Article in English | MEDLINE | ID: mdl-38629682

ABSTRACT

PURPOSE: To assess the influence of listener experience, measurement scale, and speech task on the auditory-perceptual evaluation of the overall severity (OS) of voice deviation and of the predominant type of voice (rough, breathy, or strained). METHODS: Twenty-two listeners, divided into four groups, participated in the study: speech-language pathologists specialized in voice (SLP-V), SLPs not specialized in voice (SLP-NV), graduate students with auditory-perceptual analysis training (GS-T), and graduate students without auditory-perceptual analysis training (GS-U). The listeners rated the OS of voice deviation and the predominant type of voice of 44 voices using a visual analog scale (VAS) and a numerical scale (the "G" score from GRBAS) across six speech tasks: sustained vowels /a/ and /ɛ/, sentences, number counting, running speech, and all five previous tasks together. RESULTS: Sentences yielded the best interrater reliability in each group, with both VAS and GRBAS. The SLP-NV group demonstrated the best interrater reliability in OS judgment across speech tasks with either VAS or GRBAS. Sustained vowels (/a/ and /ɛ/) and running speech yielded the best interrater reliability among the listener groups in judging the predominant vocal quality. The GS-T group achieved the best interrater reliability in judging the predominant vocal quality. CONCLUSION: Length of experience in auditory-perceptual judgment of the voice, the type of training received, and the type of speech task influence the reliability of the auditory-perceptual evaluation of vocal quality.

Subject(s)

Dysphonia , Speech Perception , Humans , Speech , Reproducibility of Results , Speech Production Measurement , Observer Variation , Voice Quality , Speech Acoustics

4.

Unravelling variations: an examination of entry point selection in proximal femoral cephalomedullary nailing.

Lisitano, Leonard; Wulff, Laura; Schmidt, Jürgen; Sieland, Christoph; Mahlke, Lutz; Röttinger, Timon; Cifuentes, Jairo; Mayr, Edgar; Rau, Kim.

J Orthop Traumatol ; 25(1): 23, 2024 Apr 23.

Article in English | MEDLINE | ID: mdl-38653863

ABSTRACT

BACKGROUND: The exact positioning of the cephalomedullary (CM) nail entry point for managing femoral fractures remains debatable, with significant implications for fracture reduction and postoperative complications. This study aimed to explore the variability in the selection of the entry point among trauma surgeons, hypothesizing potential differences and their association with surgeon experience. METHODS: In this prospective multicenter study, 16 participants, ranging from residents to senior specialists, took part in a simulation in which they determined the optimal entry point for the implantation of a proximal femoral nail antirotation (PFN-A; DePuy Synthes) in various femora. The inter- and intra-observer variability was calculated, along with comprehensive descriptive statistical analysis, to assess the variability in entry point selection and the impact of surgeon experience. RESULTS: In this study, the mean distance from the selected entry points to the calculated mean entry point was 3.98 mm, with a smaller distance observed among surgeons with more than 500 implantations (ANOVA, p = 0.050). Intra-surgeon variability for identical femora averaged 5.14 mm, showing no significant differences across various levels of surgical experience or training. Notably, 13.6% of selected entry points would not allow proper intramedullary positioning of the implant, thereby rendering anatomical repositioning unfeasible. Among these impossible entry points, a significant skew towards anterior placement was observed (70.6% of the impossible entry points), with a smaller fraction being overly lateral (27.5%) or medial (13.7%). On a patient level, the impossibility rate varied widely from 0 to 35% among the different femora examined, with a significantly higher rate seen in younger patients (mean age 55.02 versus 60.32; t-test for independent samples, p = 0.04). 
CONCLUSIONS: Significant variations exist in surgeons' selection of entry points for proximal femoral nailing, underscoring the task's complexity. Experience does not prevent the choice of unfeasible entry points, emphasizing the inadequacy of a universal approach and pointing towards the necessity for a patient-specific strategy for improved outcomes. TRIAL REGISTRATION NUMBER: DRKS00032465.

Subject(s)

Bone Nails , Femoral Fractures , Fracture Fixation, Intramedullary , Humans , Fracture Fixation, Intramedullary/methods , Fracture Fixation, Intramedullary/instrumentation , Prospective Studies , Femoral Fractures/surgery , Clinical Competence , Observer Variation , Female , Male
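The entry-point study summarises inter-observer dispersion as the mean distance from each selected entry point to the calculated mean entry point, and intra-surgeon variability as the distance between repeated choices on identical femora. A minimal sketch of both summaries follows. It assumes entry points are stored as coordinate tuples in millimetres; the abstract does not describe the coordinate system, and the function names are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence, Tuple

Point = Tuple[float, ...]  # assumed: entry-point coordinates in mm (2D or 3D)


def centroid(points: Sequence[Point]) -> Point:
    """Mean entry point: the coordinate-wise mean of all selected points."""
    return tuple(mean(axis) for axis in zip(*points))


def mean_distance_to_centroid(points: Sequence[Point]) -> float:
    """Inter-observer spread: average Euclidean distance to the mean point."""
    c = centroid(points)
    return mean(math.dist(p, c) for p in points)


def mean_repeat_distance(first: Sequence[Point], second: Sequence[Point]) -> float:
    """Intra-observer spread: average distance between paired repeat selections."""
    if len(first) != len(second):
        raise ValueError("each first selection needs a matching repeat")
    return mean(math.dist(a, b) for a, b in zip(first, second))


if __name__ == "__main__":
    corners = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    print(round(mean_distance_to_centroid(corners), 2))  # 1.41
```

For example, four selections at the corners of a 2 mm square lie about 1.41 mm from their mean point.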

5.

Interrater reliability of the violence risk assessment checklist for youth: a case vignette study.

Laake, Anniken L W; Roaldset, John Olav; Husum, Tonje Lossius; Bjørkly, Stål Kapstø; Gustavsen, Carina Chudiakow; Lockertsen, Øyvind.

BMC Psychiatry ; 24(1): 303, 2024 Apr 23.

Article in English | MEDLINE | ID: mdl-38654194

ABSTRACT

BACKGROUND: Facilities providing health and social services for youth are commonly faced with the need to assess and manage violent behavior. These providers often experience a shortage of resources, which compromises the feasibility of conducting comprehensive violence risk assessments. The Violence Risk Assessment Checklist for Youth aged 12-18 (V-RISK-Y) is a 12-item violence risk screening instrument developed to rapidly identify youth at high risk for violent behavior in situations requiring expedient evaluation of violence risk. The V-RISK-Y instrument was piloted in acute psychiatric units for youth, with positive results for predictive validity. The aim of the present study was to assess the interrater reliability of V-RISK-Y in child and adolescent psychiatric units and acute child protective services institutions. METHODS: A case vignette study design was used to assess the interrater reliability of V-RISK-Y. Staff at youth facilities (N = 163) in Norway and Sweden scored V-RISK-Y for three vignettes, and interrater reliability was assessed with the intraclass correlation coefficient (ICC). RESULTS: Results indicate good interrater reliability for the sum score and the Low-Moderate-High risk level appraisal across staff from the different facilities and professions. For single items, interrater reliability ranged from poor to excellent. CONCLUSIONS: This study is an important step in establishing the psychometric properties of V-RISK-Y. Findings support the structured professional judgment tradition on which the instrument is based, with high agreement on the overall risk assessment. This study had a case vignette design, and the next step is to assess the reliability and validity of V-RISK-Y in naturalistic settings.

Subject(s)

Checklist , Violence , Humans , Adolescent , Violence/psychology , Risk Assessment/methods , Child , Reproducibility of Results , Male , Female , Checklist/standards , Sweden , Observer Variation , Norway , Child Protective Services , Psychometrics

6.

Inter- and intra-observer reliability and agreement of O2Pulse inflection during cardiopulmonary exercise testing: A comparison of subjective and novel objective methodology.

Nickolay, Thomas; McGregor, Gordon; Powell, Richard; Begg, Brian; Birkett, Stefan; Nichols, Simon; Ennis, Stuart; Banerjee, Prithwish; Shave, Rob; Metcalfe, James; Hoye, Angela; Ingle, Lee.

PLoS One ; 19(3): e0299486, 2024.

Article in English | MEDLINE | ID: mdl-38452129

ABSTRACT

Cardiopulmonary exercise testing (CPET) is the 'gold standard' method for evaluating functional capacity, with oxygen pulse (O2Pulse) inflections serving as a potential indicator of myocardial ischaemia. However, the reliability and agreement of identifying these inflections have not been thoroughly investigated. This study aimed to assess the inter- and intra-observer reliability and agreement of a subjective quantification method for identifying O2Pulse inflections during CPET, and to propose a more robust and objective novel algorithm as an alternative methodology. A retrospective analysis was conducted using baseline data from the HIIT or MISS UK trial. The O2Pulse curves were visually inspected by two independent examiners and compared against an objective algorithm. Fleiss' Kappa was used to determine the reliability of agreement between the three groups of observations. The results showed almost perfect agreement between the algorithm and both examiners, with a Fleiss' Kappa statistic of 0.89. The algorithm also demonstrated excellent inter-rater reliability (ICC) when compared with both examiners (0.92-0.98). However, Bland-Altman analysis showed significant systematic bias (P ≤ 0.05) for comparisons involving the novice examiner. In conclusion, this study provides evidence for the reliability of both subjective and novel objective methods for identifying inflections in O2Pulse during CPET. These findings suggest that further research into the clinical significance of O2Pulse inflections is warranted, and that adopting a novel objective means of quantification may be preferable to ensure equality of outcome for patients.

Subject(s)

Exercise Test , Humans , Exercise Test/methods , Observer Variation , Reproducibility of Results , Retrospective Studies , Clinical Trials as Topic
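Fleiss' Kappa, used in the O2Pulse study to compare the algorithm with both examiners, extends chance-corrected agreement to more than two raters. The following minimal sketch takes, for each curve, the number of observers who placed it in each category (for example, inflection versus no inflection). It assumes that every subject is rated by the same number of observers, and the function name is hypothetical.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa from an N x C table of category counts per subject.

    counts[i][j] is the number of raters who assigned subject i to category j;
    every row must sum to the same number of raters m (m >= 2).
    """
    n_subjects = len(counts)
    m = sum(counts[0])
    if m < 2 or any(sum(row) != m for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    # Observed agreement: proportion of agreeing rater pairs for each subject.
    p_i = [(sum(c * c for c in row) - m) / (m * (m - 1)) for row in counts]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall proportion of ratings in each category.
    n_cats = len(counts[0])
    p_j = [sum(row[j] for row in counts) / (n_subjects * m) for j in range(n_cats)]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Three hypothetical curves, each judged by three observers (algorithm + 2 examiners).
    print(fleiss_kappa([[3, 0], [0, 3], [3, 0]]))
```

Unanimous ratings give kappa = 1 regardless of how the categories are distributed; the statistic falls as agreement approaches what the category proportions alone would produce by chance.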

7.

Evaluating inter-rater reliability in the context of "Sysmex UN2000 detection of protein/creatinine ratio and of renal tubular epithelial cells can be used for screening lupus nephritis": a statistical examination.

Li, Ming; Gao, Qian; Yang, Jing; Yu, Tianfei.

BMC Nephrol ; 25(1): 94, 2024 Mar 13.

Article in English | MEDLINE | ID: mdl-38481181

ABSTRACT

BACKGROUND: The evaluation of inter-rater reliability (IRR) is integral to research designs involving the assessment of observational ratings by two raters. However, the existing literature is often heterogeneous in how it reports statistical procedures and the evaluation of IRR, although such information can affect subsequent hypothesis-testing analyses. METHODS: This paper evaluates a recent publication by Chen et al. in BMC Nephrology, with the aim of introducing an alternative statistical approach to assessing IRR and discussing its statistical properties. The study underscores the need to select appropriate Kappa statistics, emphasizing the accurate computation, interpretation, and reporting of commonly used IRR statistics between two raters. RESULTS: Cohen's Kappa statistic is typically used for two raters dealing with two categories or with unordered categorical variables having three or more categories. In contrast, when assessing the concordance between two raters for ordered categorical variables with three or more categories, the commonly employed measure is the weighted Kappa. CONCLUSION: Chen and colleagues might have underestimated the agreement between AU5800 and UN2000. Although the statistical approach adopted in Chen et al.'s research did not alter their findings, researchers should be discerning in choosing statistical techniques suited to their specific research questions.

Subject(s)

Lupus Nephritis , Humans , Creatinine , Reproducibility of Results , Lupus Nephritis/diagnosis , Observer Variation , Epithelial Cells
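The distinction drawn in this commentary, Cohen's Kappa for unordered categories versus weighted Kappa for ordered ones, can be seen in a single function. Unweighted kappa treats every disagreement as total, whereas linear or quadratic weights give partial credit to near misses. The following is a minimal sketch that assumes ratings are coded as integers 0..k-1 in their natural order; `weighted_kappa` is a hypothetical name, not a library API.

```python
from typing import Sequence


def weighted_kappa(a: Sequence[int], b: Sequence[int], k: int,
                   weights: str = "none") -> float:
    """Chance-corrected agreement between two raters over k ordered categories.

    weights="none" gives Cohen's (unweighted) kappa; "linear" and "quadratic"
    give the corresponding weighted kappas. Ratings must be ints in 0..k-1.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two equally long, non-empty rating lists")
    if k < 2:
        raise ValueError("need at least two categories")
    n = len(a)

    # Joint distribution of (rater A, rater B) categories and its marginals.
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1.0 / n
    p_a = [sum(observed[i]) for i in range(k)]
    p_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i: int, j: int) -> float:
        d = abs(i - j) / (k - 1)
        if weights == "none":
            return 0.0 if i == j else 1.0
        if weights == "linear":
            return d
        if weights == "quadratic":
            return d * d
        raise ValueError(f"unknown weights: {weights!r}")

    cells = [(i, j) for i in range(k) for j in range(k)]
    obs = sum(disagreement(i, j) * observed[i][j] for i, j in cells)
    exp = sum(disagreement(i, j) * p_a[i] * p_b[j] for i, j in cells)
    if exp == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - obs / exp


if __name__ == "__main__":
    rater_a = [0, 1, 2, 2, 1, 0]
    rater_b = [0, 1, 2, 1, 1, 0]
    for w in ("none", "linear", "quadratic"):
        print(w, round(weighted_kappa(rater_a, rater_b, 3, w), 3))
```

In this toy example, the single near-miss disagreement (2 versus 1) gives an unweighted kappa of 0.75, while the linear and quadratic weighted kappas are 0.80 and about 0.857. This shows how the choice of statistic can shift the reported agreement for ordinal data.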

8.

A Novel Method for the Measurement of the Vaginal Wall Thickness by Transvaginal Ultrasound: A Study of Inter- and Intra-Observer Reliability.

Bosio, Sara; Barba, Marta; Vigna, Annalisa; Cola, Alice; De Vicari, Desirèe; Costa, Clarissa; Volontè, Silvia; Frigerio, Matteo.

Medicina (Kaunas) ; 60(3)2024 Feb 22.

Article in English | MEDLINE | ID: mdl-38541095

ABSTRACT

Background and Objectives: There is still no consensus in the literature on the optimal sonographic technique for measuring vaginal wall thickness (VWT). This study aims to validate a new method for measuring VWT using a biplanar transvaginal ultrasound probe and to assess both its intra-operator and inter-operator reproducibility. Material and Methods: This prospective study included patients with symptoms related to genitourinary syndrome of menopause. Women were scanned using a BK Medical Flex Focus 400 with the 65 × 5.5 mm linear longitudinal transducer of an endovaginal biplanar probe (BK Medical probe 8848, BK Ultrasound, Peabody, MA, USA). VWT measurements were acquired from the anterior and posterior vaginal walls at three levels. Results: An inter-observer analysis revealed good consistency between operators at every anatomical site, with intraclass correlation coefficients (ICCs) ranging from 0.931 to 0.987, indicating high reliability. An intra-observer analysis demonstrated robust consistency in VWT measurements, with ICCs exceeding 0.9 at all anatomical sites. Conclusions: Measurement of VWT by transvaginal biplanar ultrasound was easy and demonstrated good intra- and inter-operator reliability.

Subject(s)

Vagina , Humans , Female , Reproducibility of Results , Prospective Studies , Observer Variation , Ultrasonography , Vagina/diagnostic imaging

9.

Development and Validation of the Bilingual Catalan/Spanish Cross-Cultural Adaptation of the Consensus Auditory-Perceptual Evaluation of Voice.

Calaf, Neus; Garcia-Quintana, David.

J Speech Lang Hear Res ; 67(4): 1072-1089, 2024 Apr 08.

Article in English | MEDLINE | ID: mdl-38527275

ABSTRACT

PURPOSE: This study aimed to develop a valid and reliable bilingual version of the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) for the auditory-perceptual evaluation of voice in Catalan and Spanish speakers. METHOD: The development of this CAPE-V adaptation included Delphi methodology with 20 voice and speech experts reaching consensus on the optimal adapted terminology of the perceptual vocal attributes, considering also input from the original instrument authors. The adaptation and validation of vocal tasks followed a sequential validation procedure, with input from phoneticians and speech-language pathologists. Following pilot testing with a large sample of speech-language pathology students, a refined adapted version was empirically tested for validity and reliability. Concurrent validity was assessed by comparing the adapted CAPE-V with the reference Grade, Roughness, Breathiness, Asthenia, Strain scale. Construct validity was assessed through convergent and discriminant validity analysis. Intrarater and interrater reliability were assessed via intraclass correlation coefficient calculations. User experience was evaluated through a questionnaire. Scale properties were validated using a confusion matrix, and cutoff values were calculated to achieve the optimal balance between sensitivity and specificity. RESULTS: Through a formalized consensus process, optimal Catalan/Spanish terminology was determined for the perceptual attributes of voice present in the CAPE-V. An adapted protocol of tasks was obtained that preserves the objectives of the original instrument and the relevance of the phonetic criteria in the target languages. The results demonstrated concurrent validity, construct validity, and intrarater reliability. Interrater reliability was found to depend on the extent to which evaluators shared their internal standards. The raters identified CAPE-V as an effective and preferred instrument. 
CONCLUSION: An adapted, validated version of the CAPE-V is made available to clinical professionals for the evaluation of voice in Catalan and Spanish speakers.

Subject(s)

Dysphonia , Humans , Cross-Cultural Comparison , Consensus , Reproducibility of Results , Voice Quality , Observer Variation
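The CAPE-V adaptation study reports cutoff values chosen to achieve the optimal balance between sensitivity and specificity, but the abstract does not name the criterion. One common choice is Youden's J (sensitivity + specificity - 1). The sketch below assumes that criterion, treats scores at or above the cutoff as positive, and uses hypothetical names and data.

```python
from typing import Sequence, Tuple


def youden_cutoff(scores: Sequence[float],
                  labels: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximising Youden's J.

    A score >= cutoff is classified as positive. Candidate cutoffs are the
    observed scores; when J ties, the lowest such cutoff is kept.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if not y and s < cut)
        sens, spec = tp / positives, tn / negatives
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    _, cut, sens, spec = best
    return cut, sens, spec


if __name__ == "__main__":
    # Hypothetical 0-100 severity ratings; True marks a voice judged deviant.
    severity = [10, 20, 30, 40, 50, 60]
    deviant = [False, False, False, True, True, True]
    print(youden_cutoff(severity, deviant))  # (40, 1.0, 1.0)
```

Other criteria, such as the point closest to the top-left corner of the ROC curve, can pick a different cutoff from the same data.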

10.

Reproducibility assessment of rapid strains in cardiac MRI: Insights and recommendations for clinical application.

Halfmann, Moritz C; Hopman, Luuk H G A; Körperich, Hermann; Blaszczyk, Edyta; Gröschel, Jan; Schulz-Menger, Jeanette; Salatzki, Janek; André, Florian; Friedrich, Silke; Emrich, Tilman.

Eur J Radiol ; 174: 111386, 2024 May.

Article in English | MEDLINE | ID: mdl-38447431

ABSTRACT

PURPOSE: Studies have shown the incremental value of strain imaging in various cardiac diseases. However, reproducibility and generalizability have remained a concern. To overcome this, simplified algorithms such as rapid atrioventricular strains have been proposed. This multicenter study aimed to assess the reproducibility of rapid strains in a real-world setting and to identify potential predictors of higher interobserver variation. METHODS: Four sites retrospectively identified 80 patients and 80 healthy controls who had undergone cardiac magnetic resonance imaging (CMR) at their respective centers, using locally available scanners with their own field strengths and imaging protocols. Strain and volumetric parameters were measured at each site and then independently re-evaluated by a blinded core lab. Intraclass correlation coefficients (ICC) and Bland-Altman plots were used to assess inter-observer agreement. In addition, backward multiple linear regression analysis was performed to identify predictors of higher inter-observer variation. RESULTS: There was excellent agreement between sites in feature-tracking and rapid strain values (ICC ≥ 0.96). Bland-Altman plots showed no significant bias. Bi-atrial feature-tracking and rapid strains showed equally excellent agreement (ICC ≥ 0.96) but broader limits of agreement (≤18.0 % vs. ≤3.5 %). Regression analysis showed that higher field strength and lower temporal resolution (>30 ms) independently predicted reduced interobserver agreement for bi-atrial strain parameters (β = 0.38, p = 0.02 for field strength and β = 0.34, p = 0.02 for temporal resolution). CONCLUSION: Simplified rapid left ventricular and bi-atrial strain parameters can be reliably applied in a real-world multicenter setting. Based on the regression analysis, a minimum temporal resolution of 30 ms is recommended when assessing atrial deformation.

Subject(s)

Magnetic Resonance Imaging, Cine , Magnetic Resonance Imaging , Humans , Retrospective Studies , Reproducibility of Results , Magnetic Resonance Imaging, Cine/methods , Heart Atria , Observer Variation , Ventricular Function, Left
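Bland-Altman analysis, used in the strain study alongside ICCs, summarises agreement between two measurements of the same quantity (for example, site versus core-lab strain) by the mean difference (bias) and the 95% limits of agreement. The following minimal sketch assumes paired measurements and the conventional bias ± 1.96 SD of the differences. The function name is hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower_loa, upper_loa) for paired measurements a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)   # systematic difference between the two observers
    sd = stdev(diffs)    # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    site = [10.0, 12.0, 14.0]       # hypothetical strain values (%)
    core_lab = [9.0, 12.0, 15.0]
    print(bland_altman(site, core_lab))
```

A very high ICC can coexist with broad limits of agreement, as the bi-atrial results above illustrate. The ICC is scaled by between-subject variance, whereas the limits of agreement are expressed in the measurement's own units.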

11.

Comparison of the Reliability of the House-Brackmann, Facial Nerve Grading System 2.0, and Sunnybrook Facial Grading System for the Evaluation of Patients with Peripheral Facial Paralysis.

Mengi, Erdem; Orhan Kara, Cüneyt; Necdet Ardiç, Fazil; Topuz, Bülent; Metin, Ulas; Alptürk, Ugur; Aydemir, Gökçe; Senol, Hande.

J Int Adv Otol ; 20(1): 14-18, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38454283

ABSTRACT

BACKGROUND: To compare the reliability of the House-Brackmann (HB), Facial Nerve Grading System 2.0 (FNGS 2.0), and Sunnybrook Facial Grading System (SB), which are widely used in the evaluation of patients with peripheral facial paralysis (PFP). METHODS: Thirty-five adult PFP patients with video recordings were included in the study. The evaluators were 6 physicians, who each performed two independent evaluations using the video recordings and timed each evaluation. For the analysis of reliability, Fleiss' kappa coefficient was used for the HB, and the intraclass correlation coefficient (ICC) was used for the FNGS 2.0 and SB. RESULTS: The mean evaluation time per patient was 1.06 ± 0.24, 1.47 ± 0.23, and 2.32 ± 0.41 minutes for the HB, FNGS 2.0, and SB, respectively. For interrater reliability, Fleiss' kappa for the HB was 0.495 and 0.403; ICC for the FNGS 2.0 was 0.966 and 0.958; and ICC for the SB was 0.960 and 0.967 for the first and second measurements, respectively. For intrarater reliability, Fleiss' kappa for the HB was 0.391, 0.446, 0.564, 0.502, 0.626, and 0.455; ICC for the FNGS 2.0 was 0.87, 0.982, 0.966, 0.929, 0.933, and 0.948; and ICC for the SB was 0.935, 0.96, 0.895, 0.941, 0.96, and 0.94 for the 6 raters, respectively. CONCLUSION: In the present study, high intra- and interrater reliability was found for the FNGS 2.0 and SB, whereas only moderate reliability was found for the HB. Although the HB appears more practical, the FNGS 2.0 and SB were concluded to be more reliable.

Subject(s)

Facial Paralysis , Adult , Humans , Facial Paralysis/diagnosis , Facial Nerve , Reproducibility of Results , Observer Variation , Face

12.

Augmented interpretation of HER2, ER, and PR in breast cancer by artificial intelligence analyzer: enhancing interobserver agreement through a reader study of 201 cases.

Jung, Minsun; Song, Seung Geun; Cho, Soo Ick; Shin, Sangwon; Lee, Taebum; Jung, Wonkyung; Lee, Hajin; Park, Jiyoung; Song, Sanghoon; Park, Gahee; Song, Heon; Park, Seonwook; Lee, Jinhee; Kang, Mingu; Park, Jongchan; Pereira, Sergio; Yoo, Donggeun; Chung, Keunhyung; Ali, Siraj M; Kim, So-Woon.

Breast Cancer Res ; 26(1): 31, 2024 02 23.

Article in English | MEDLINE | ID: mdl-38395930

ABSTRACT

BACKGROUND: Accurate classification of breast cancer molecular subtypes is crucial in determining treatment strategies and predicting clinical outcomes. This classification largely depends on the assessment of human epidermal growth factor receptor 2 (HER2), estrogen receptor (ER), and progesterone receptor (PR) status. However, variability in interpretation among pathologists poses challenges to the accuracy of this classification. This study evaluates the role of artificial intelligence (AI) in enhancing the consistency of these evaluations. METHODS: AI-powered HER2 and ER/PR analyzers, consisting of cell and tissue models, were developed using 1,259 HER2, 744 ER, and 466 PR-stained immunohistochemistry (IHC) whole-slide images of breast cancer. An external validation cohort comprising HER2, ER, and PR IHCs of 201 breast cancer cases was analyzed with these AI-powered analyzers. Three board-certified pathologists independently assessed these cases without AI annotation. Cases with differing interpretations between the pathologists and the AI analyzer were then revisited with AI assistance, to evaluate the influence of AI assistance on concordance among pathologists in the revised evaluation compared with the initial assessment. RESULTS: Reevaluation was required in 61 (30.3%), 42 (20.9%), and 80 (39.8%) of HER2, in 15 (7.5%), 17 (8.5%), and 11 (5.5%) of ER, and in 26 (12.9%), 24 (11.9%), and 28 (13.9%) of PR evaluations by the three pathologists, respectively. Compared with the initial interpretations, AI assistance increased agreement among the three pathologists on the status of HER2 (from 49.3 to 74.1%, p < 0.001), ER (from 93.0 to 96.5%, p = 0.096), and PR (from 84.6 to 91.5%, p = 0.006). This improvement was especially evident in HER2 2+ and 1+ cases, where concordance significantly increased from 46.2 to 68.4% and from 26.5 to 70.7%, respectively. 
Consequently, a refinement in the classification of breast cancer molecular subtypes (from 58.2 to 78.6%, p < 0.001) was achieved with AI assistance. CONCLUSIONS: This study underscores the significant role of AI analyzers in improving pathologists' concordance in the classification of breast cancer molecular subtypes.

Subject(s)

Breast Neoplasms , Humans , Female , Breast Neoplasms/diagnosis , Breast Neoplasms/metabolism , Receptors, Estrogen/metabolism , Biomarkers, Tumor/metabolism , Artificial Intelligence , Observer Variation , Receptors, Progesterone/metabolism , Receptor, ErbB-2/metabolism
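The HER2/ER/PR reader study reports agreement among the three pathologists as percentages, but the abstract does not define the measure. The sketch below assumes that it means the share of cases on which all three readers assigned the same category, and it also computes mean pairwise agreement for comparison. Neither measure is chance-corrected; Fleiss' kappa, used elsewhere in this listing, would be the chance-corrected counterpart. Function names and example scores are hypothetical.

```python
from itertools import combinations
from typing import Hashable, Sequence


def full_agreement_rate(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Share of cases on which every reader gave the same category."""
    return sum(1 for case in ratings if len(set(case)) == 1) / len(ratings)


def mean_pairwise_agreement(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Average, over cases, of the fraction of reader pairs that agree."""
    per_case = []
    for case in ratings:
        pairs = list(combinations(case, 2))
        per_case.append(sum(1 for x, y in pairs if x == y) / len(pairs))
    return sum(per_case) / len(per_case)


if __name__ == "__main__":
    # Hypothetical HER2 scores from three readers for four cases.
    her2 = [("0", "0", "0"), ("1+", "2+", "1+"), ("3+", "3+", "3+"), ("2+", "2+", "1+")]
    print(full_agreement_rate(her2))  # 0.5
    print(round(mean_pairwise_agreement(her2), 3))
```

Full three-way agreement is the stricter of the two measures: a case on which two of three readers agree counts as a disagreement there but contributes one agreeing pair out of three to the pairwise measure.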

13.

Review of sample size determination methods for the intraclass correlation coefficient in the one-way analysis of variance model.

Mondal, Dipro; Vanbelle, Sophie; Cassese, Alberto; Candel, Math Jjm.

Stat Methods Med Res ; 33(3): 532-553, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38320802

ABSTRACT

Reliability of measurement instruments providing quantitative outcomes is usually assessed by an intraclass correlation coefficient. When participants are repeatedly measured by a single rater or device, or are each rated by a different group of raters, the intraclass correlation coefficient is based on a one-way analysis of variance model. When planning a reliability study, it is essential to determine the number of participants and the number of measurements per participant (i.e. the number of raters or of repeated measurements). Three different sample size determination approaches under the one-way analysis of variance model were identified in the literature, all based on a confidence interval for the intraclass correlation coefficient. Although eight different confidence interval methods can be identified, the Wald confidence interval with Fisher's large-sample variance approximation remains the most commonly used, despite its well-known poor statistical properties. Therefore, the first objective of this work is to compare the statistical properties of all identified confidence interval methods, including those overlooked in previous studies. The second objective is to develop a general procedure to determine the sample size under all approaches, since a closed-form formula is not always available. This procedure is implemented in an R Shiny app. Finally, we provide advice for choosing an appropriate sample size determination method when planning a reliability study.

Subject(s)

Sample Size , Humans , Reproducibility of Results , Observer Variation , Analysis of Variance
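To make the one-way setting in the sample-size review concrete, the sketch below estimates ICC(1) from a participants-by-measurements table. It then forms the Wald confidence interval with Fisher's large-sample variance approximation, the common method that the abstract notes has poor statistical properties. Finally, it inverts that interval in closed form to give a planning sample size for a target full width, an approach usually attributed to Bonett (2002). The Wald interval is shown only because it has a simple closed form, not as the method the review recommends. Function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def icc_oneway(scores: Sequence[Sequence[float]]) -> float:
    """ICC(1): one-way random effects, single measurement.

    Rows are participants; columns are the k measurements per participant.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2 for row, m in zip(scores, row_means)
                    for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def fisher_variance(rho: float, n: int, k: int) -> float:
    """Fisher's large-sample variance approximation for the one-way ICC."""
    return 2 * (1 - rho) ** 2 * (1 + (k - 1) * rho) ** 2 / (k * (k - 1) * (n - 1))


def wald_ci(rho: float, n: int, k: int, level: float = 0.95) -> Tuple[float, float]:
    """Wald interval rho +/- z * sqrt(Fisher variance)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(fisher_variance(rho, n, k))
    return rho - half, rho + half


def sample_size_for_width(rho: float, k: int, width: float, level: float = 0.95) -> int:
    """Smallest n whose Wald interval (planning value rho) has full width <= width."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    n = 1 + 8 * z ** 2 * (1 - rho) ** 2 * (1 + (k - 1) * rho) ** 2 / (k * (k - 1) * width ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    n = sample_size_for_width(rho=0.7, k=2, width=0.2)
    print(n, wald_ci(0.7, n, 2))  # n = 101
```

For example, planning for ρ = 0.7 with two measurements per participant and a 95% interval no wider than 0.2 gives n = 101. Because the Wald interval's coverage can be poor, other interval methods may lead to a different n.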

14.

Modification of Woven Endo-Bridge After Intracranial Aneurysm Treatment: A Methodology for Three-Dimensional Analysis of Shape and Relative Position Changes.

Muñoz, Romina; Dazeo, Nicolás; Estevez-Areco, Santiago; Janot, Kevin; Narata, Ana Paula; Rouchaud, Aymeric; Larrabide, Ignacio.

Ann Biomed Eng ; 52(5): 1403-1414, 2024 May.

Article in English | MEDLINE | ID: mdl-38402315

ABSTRACT

During follow-up of patients treated with Woven Endo-Bridge (WEB) devices, shape changes have been observed. Quantitative three-dimensional measurement of WEB shape modification (WSM) would offer useful information to study in association with anatomical results and could help clarify the mechanisms involved in this phenomenon. We present a methodology to quantify the morphology and position of the WEB device in relation to the vascular anatomy. Three-dimensional rotational angiography (3DRA) images of seven patients with aneurysms treated with WEB devices were used, each accompanied by a post-treatment 3DRA image and a follow-up 3DRA image. The device was manually segmented to obtain 3D models after treatment and at follow-up. Volume, surface area, height, maximum diameter, and the WSM ratio of both surfaces were calculated. Position changes were evaluated by measuring the WEB axis and the relative position between post-treatment and follow-up. Changes in WEB volume and surface area were observed, with mean modifications of -5.04% (±14.19) and -1.68% (±8.29), respectively. The positional variables also showed differences: the mean change in device axis direction was 26.25% (±24.09) and the mean change in distance l_b was 5.87% (±10.59). Inter-observer and intra-observer variability analyses showed no differences (ANOVA p > 0.05). This methodology allows quantification of the morphological and positional changes of the WEB device after treatment, offering new information to be studied in relation to the occurrence of WEB shape modification.

Subject(s)

Embolization, Therapeutic , Endovascular Procedures , Intracranial Aneurysm , Humans , Intracranial Aneurysm/diagnostic imaging , Intracranial Aneurysm/therapy , Observer Variation , Treatment Outcome , Retrospective Studies , Cerebral Angiography/methods

15.

Regard to assessing agreement between two raters with kappa statistics.

Yu, Tianfei; Ren, Bingrui; Li, Ming.

Int J Cardiol ; 403: 131896, 2024 May 15.

Article in English | MEDLINE | ID: mdl-38387729

Subject(s)

Reproducibility of Results , Humans , Observer Variation

16.

Torsobarography: Intra-Observer Reliability Study of a Novel Posture Analysis Based on Pressure Distribution.

Stecher, Nico; Heinke, Andreas; Zurawski, Arkadiusz Lukasz; Harder, Maximilian Robert; Schumann, Paula; Jochim, Thurid; Malberg, Hagen.

Sensors (Basel) ; 24(3)2024 Jan 24.

Article in English | MEDLINE | ID: mdl-38339484

Abstract

Postural deformities often manifest as sagittal imbalance and asymmetric torso morphology. As a novel topographic method, torsobarography assesses the morphology of the back by analysing the pressure distribution along the torso in a lying position. At its core is a capacitive pressure sensor array. To evaluate its feasibility as a diagnostic tool, the reproducibility of the system and of the extracted anatomically associated parameters was evaluated in 40 subjects. Landmarks and reference distances were identified within the pressure images. The examined parameters describe the shape of the spine, structures relevant to trunk symmetry such as the scapulae, and pelvic posture. The different structures were localised with good (ICC > 0.75) to excellent (ICC > 0.90) reliability. In particular, parameters approximating the sagittal spine shape were reliably reproduced (ICC > 0.83). Lower reliability was observed for asymmetry parameters, which can be related to the low variability within the subject group. Nonetheless, the reliability of selected parameters is comparable to that of commercial systems. This study demonstrates the substantial potential of torsobarography in its current stage for reliable posture analysis and may pave the way for its use as an early detection system for postural deformities.
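The abstract reports reliability as ICC thresholds but does not state which ICC form was used. As an illustration only, the sketch below computes one common form, ICC(2,1) in Shrout and Fleiss notation (two-way random effects, absolute agreement, single measurement), from two-way ANOVA mean squares. The choice of form and the function name `icc_2_1` are assumptions, not the authors' analysis. Each row holds one subject and each column one repeated measurement session.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: list of rows, one per subject; each row has one value per
    session (intra-observer) or per rater (inter-observer).
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of sessions or raters
    grand = sum(v for row in ratings for v in row) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of a two-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares for subjects, sessions/raters, and residual error
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

Identical repeated measurements give an ICC of 1.0. Other ICC forms (for example, consistency instead of absolute agreement) drop the session term from the denominator and can give noticeably higher values on the same data.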

Subject(s)

Posture , Spine , Humans , Reproducibility of Results , Observer Variation , Pelvis

17.

CD, or not CD, that is the question: a digital interobserver agreement study in coeliac disease.

Denholm, James; Schreiber, Benjamin A; Jaeckle, Florian; Wicks, Mike N; Benbow, Emyr W; Bracey, Tim S; Chan, James Y H; Farkas, Lorant; Fryer, Eve; Gopalakrishnan, Kishore; Hughes, Caroline A; Kirkwood, Kathryn J; Langman, Gerald; Mahler-Araujo, Betania; McMahon, Raymond F T; Myint, Khun La Win; Natu, Sonali; Robinson, Andrew; Sanduka, Ashraf; Sheppard, Katharine A; Tsang, Yee Wah; Arends, Mark J; Soilleux, Elizabeth J.

BMJ Open Gastroenterol ; 11(1)2024 Feb 01.

Article in English | MEDLINE | ID: mdl-38302475

Abstract

OBJECTIVE: Coeliac disease (CD) diagnosis generally depends on histological examination of duodenal biopsies. We present the first study analysing the concordance in examination of duodenal biopsies using digitised whole-slide images (WSIs). We further investigate whether the inclusion of immunoglobulin A tissue transglutaminase (IgA tTG) and haemoglobin (Hb) data improves the interobserver agreement of diagnosis. DESIGN: We undertook a large study of the concordance in histological examination of duodenal biopsies using digitised WSIs in an entirely virtual reporting setting. Our study was organised in two phases: in phase 1, 13 pathologists independently classified 100 duodenal biopsies (40 normal; 40 CD; 20 indeterminate enteropathy) in the absence of any clinical or laboratory data. In phase 2, the same pathologists examined the (re-anonymised) WSIs with the inclusion of IgA tTG and Hb data. RESULTS: We found the mean probability of two observers agreeing in the absence of additional data to be 0.73 (±0.08) with a corresponding Cohen's kappa of 0.59 (±0.11). We further showed that the inclusion of additional data increased the concordance to 0.80 (±0.06) with a Cohen's kappa coefficient of 0.67 (±0.09). CONCLUSION: We showed that the addition of serological data significantly improves the quality of CD diagnosis. However, the limited interobserver agreement in CD diagnosis using digitised WSIs, even after the inclusion of IgA tTG and Hb data, indicates the importance of interpreting duodenal biopsy in the appropriate clinical context. It further highlights the unmet need for an objective means of reproducible duodenal biopsy diagnosis, such as the automated analysis of WSIs using artificial intelligence.
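The two headline statistics can be illustrated with a short sketch. It assumes, since the abstract does not say, that both figures are means over all pairs of pathologists, with each pair's Cohen's kappa computed from that pair's own label frequencies. `cohen_kappa` and `mean_pairwise` are hypothetical names, and any labels passed to them here are made up.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def cohen_kappa(a, b):
    """Cohen's kappa for two raters' categorical labels on the same items."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies
    p_e = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / n ** 2
    if p_e == 1:
        return float("nan")  # undefined: both raters used one identical category
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise(labels_by_rater):
    """Mean pairwise agreement probability and mean pairwise Cohen's kappa."""
    pairs = list(combinations(labels_by_rater, 2))
    agree = mean(sum(x == y for x, y in zip(a, b)) / len(a) for a, b in pairs)
    kappa = mean(cohen_kappa(a, b) for a, b in pairs)
    return agree, kappa
```

With three hypothetical raters labelling four biopsies as `["N", "CD", "CD", "N"]`, `["N", "CD", "N", "N"]` and `["N", "CD", "CD", "N"]`, `mean_pairwise` returns roughly (0.833, 0.667). Kappa is lower than raw agreement because it discounts the agreement expected by chance from each rater's label frequencies.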

Subject(s)

Celiac Disease , Humans , Celiac Disease/diagnosis , Transglutaminases , Artificial Intelligence , Observer Variation , Immunoglobulin A

18.

Intra and inter-rater reproducibility of the Remote Static Posture Assessment (ARPE) protocol's Postural Checklist.

Pilling, Betiane Moreira; Candotti, Cláudia Tarragô; Silva, Marcelle Guimarães; Frantz, Marina Ziegler; Noll, Matias.

PLoS One ; 19(2): e0297506, 2024.

Article in English | MEDLINE | ID: mdl-38335201

Abstract

The social distancing enforced during the pandemic created a need to conduct postural assessments remotely. This study therefore aimed to assess the intra- and inter-rater reproducibility of the Remote Static Posture Assessment (ARPE) protocol's Postural Checklist. The study involved 51 participants, with the postural assessment conducted by two researchers. For intra-rater reproducibility, one rater administered the ARPE protocol twice, 7 days apart (test-retest). A second, independent rater assessed inter-rater reproducibility. Kappa statistics (k) and percentage agreement (%C) were used, with a significance level of 0.05. Intra-rater reproducibility was high, with k values from 0.921 to 1.0 and %C from 94% to 100% for all items on the ARPE protocol's Postural Checklist. Inter-rater reproducibility ranged from slight to good: k exceeded 0.4 for the entire checklist except for four items: waists in the frontal photograph (k = 0.353), scapulae in the rear photograph (k = 0.310), popliteal line of the knees in the rear photograph (k = 0.270), and foot posture in the rear photograph (k = 0.271). Nonetheless, %C surpassed 50% for all items except the scapulae (%C = 47%). The ARPE protocol's Postural Checklist is reproducible and can be administered by the same or different raters for static posture assessment. However, when it is used by different raters, the items waists (front of the frontal plane), scapulae, popliteal line of the knees, and feet (rear of the frontal plane) should not be considered.

Subject(s)

Checklist , Posture , Humans , Reproducibility of Results , Observer Variation

19.

The role of ¹⁸F-FDG PET in minimizing variability in gross tumor volume delineation of soft tissue sarcomas.

Najem, Elie; Marin, Thibault; Zhuo, Yue; Lahoud, Rita Maria; Tian, Fei; Beddok, Arnaud; Rozenblum, Laura; Xing, Fangxu; Moteabbed, Maryam; Lim, Ruth; Liu, Xiaofeng; Woo, Jonghye; Lostetter, Stephen John; Lamane, Abdallah; Chen, Yen-Lin Evelyn; Ma, Chao; El Fakhri, Georges.

Radiother Oncol ; 194: 110186, 2024 May.

Article in English | MEDLINE | ID: mdl-38412906

Abstract

BACKGROUND: Accurate gross tumor volume (GTV) delineation is a critical step in radiation therapy treatment planning. However, it is reader dependent and thus susceptible to intra- and inter-reader variability. GTV delineation of soft tissue sarcoma (STS) often relies on CT and MR images. PURPOSE: This study investigates the potential role of 18F-FDG PET in reducing intra- and inter-reader variability, thereby improving the reproducibility of GTV delineation in STS, without incurring additional costs or radiation exposure. MATERIALS AND METHODS: Three readers independently delineated the GTV in 61 patients with STS, first using CT and MR images and then using CT, MR, and 18F-FDG PET images. Each reader performed a total of six delineation trials, three per imaging modality group. The Dice Similarity Coefficient (DSC) and Hausdorff distance (HD) were used to assess both intra- and inter-reader variability, with GTVs generated by simultaneous truth and performance level estimation (STAPLE) serving as the ground truth. Statistical analysis was performed using the Wilcoxon signed-rank test. RESULTS: There was a statistically significant decrease in both intra- and inter-reader variability in GTV delineation using CT, MR, and 18F-FDG PET images vs. CT and MR images. This was reflected in a higher DSC and a lower HD for GTVs drawn from CT, MR, and 18F-FDG PET images vs. GTVs drawn from CT and MR, for all readers and across all three trials. CONCLUSION: Incorporating 18F-FDG PET with CT and MR images decreased intra- and inter-reader variability and consequently increased the reproducibility of GTV delineation in STS.
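The two agreement metrics can be sketched for segmentations represented as sets of voxel coordinates. This is a brute-force illustration under stated assumptions: the abstract does not describe the implementation, practical pipelines usually compute the Hausdorff distance on contour or surface points scaled to millimetres by the voxel spacing, and the names `dice` and `hausdorff` are hypothetical. A higher Dice score (maximum 1) and a lower Hausdorff distance both indicate closer agreement with the reference contour, here the STAPLE GTV.

```python
import math


def dice(mask_a, mask_b):
    """Dice similarity coefficient between two sets of voxel coordinates."""
    if not mask_a and not mask_b:
        return 1.0  # two empty segmentations agree trivially
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets.

    Brute force O(|A| * |B|); distances are in voxel units unless the
    coordinates have already been scaled by the voxel spacing.
    """
    def directed(p, q):
        # Largest distance from a point in p to its nearest neighbour in q
        return max(min(math.dist(x, y) for y in q) for x in p)

    return max(directed(points_a, points_b), directed(points_b, points_a))
```

For two 2 × 2 squares offset by one voxel, `{(0, 0), (0, 1), (1, 0), (1, 1)}` and `{(1, 0), (1, 1), (2, 0), (2, 1)}`, Dice is 0.5 and the Hausdorff distance is 1.0. Dice summarises overall overlap, while the Hausdorff distance is driven by the single worst boundary disagreement, which is why the two are often reported together.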

Subject(s)

Fluorodeoxyglucose F18 , Magnetic Resonance Imaging , Positron-Emission Tomography , Sarcoma , Tumor Burden , Humans , Sarcoma/diagnostic imaging , Sarcoma/pathology , Sarcoma/radiotherapy , Positron-Emission Tomography/methods , Female , Male , Magnetic Resonance Imaging/methods , Middle Aged , Radiopharmaceuticals , Observer Variation , Adult , Aged , Reproducibility of Results , Tomography, X-Ray Computed/methods , Soft Tissue Neoplasms/diagnostic imaging , Soft Tissue Neoplasms/pathology , Soft Tissue Neoplasms/radiotherapy , Radiotherapy Planning, Computer-Assisted/methods

20.

Intraobserver and interobserver agreement of 8 segmental reflexes in healthy dogs.

Chiang, Bryan; Garcia, Gabriel; Leverone, Francesco; Hernandez, Jorge A; Carrera-Justiz, Sheila.

J Vet Intern Med ; 38(2): 1101-1110, 2024.

Article in English | MEDLINE | ID: mdl-38339888

Abstract

BACKGROUND: No available literature supports the claim that the patellar and withdrawal (flexor) reflexes are the only reliable segmental reflexes in dogs. OBJECTIVE: To measure intra- and interobserver agreement of 8 segmental reflexes in dogs without clinical evidence of orthopedic or neurologic disease. ANIMALS: One hundred and one client- or staff-owned dogs between 1 and 10 years of age with no clinical evidence of orthopedic disease, myelopathy, or neuromuscular disease. METHODS: Descriptive study. For each of 3 observers, the intraobserver proportion of agreement (%) between responses of the right and left limbs to selected segmental reflexes was calculated and reported. Interobserver agreement between 2 observers on responses to selected reflexes was estimated by calculating proportions of agreement, kappa values, and 95% confidence intervals. A segmental reflex was defined as having acceptable agreement if its proportion of agreement was ≥90% and its kappa value was ≥0.61 in both limbs. RESULTS: The intraobserver proportion of agreement for all 3 observers was high (≥95%) for the extensor carpi radialis, withdrawal, patellar, and cranial tibial reflexes. Between observers 1 and 3 and between observers 2 and 3, the interobserver proportion of agreement was high (≥92%) for the extensor carpi radialis (κ 0.66, not determined [ND]), withdrawal (both limbs, κ ND), patellar (κ ND), and cranial tibial reflexes (κ ND). CONCLUSIONS AND CLINICAL IMPORTANCE: The extensor carpi radialis, withdrawal, patellar, and cranial tibial reflexes had higher proportions of agreement and kappa values between 2 observers.
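Several kappa values above are reported as not determined (ND). One plausible explanation, offered here as an assumption rather than the authors' stated reason: when both observers record a reflex as present in every dog, the expected chance agreement p_e equals 1, so kappa = (p_o - p_e) / (1 - p_e) becomes 0/0 even though the proportion of agreement is 100%. The sketch below computes both statistics for one limb from a hypothetical 2 × 2 table of present/absent calls. The function name `agreement_2x2` and all counts are illustrative.

```python
def agreement_2x2(both_present, only_a, only_b, both_absent):
    """Proportion of agreement and Cohen's kappa from a 2 x 2 table.

    only_a: reflex present for observer A only; only_b: present for B only.
    Returns (p_o, kappa); kappa is None when it is undefined (p_e == 1).
    """
    n = both_present + only_a + only_b + both_absent
    p_o = (both_present + both_absent) / n  # observed proportion of agreement
    # Each observer's marginal proportion of "present" calls
    pa = (both_present + only_a) / n
    pb = (both_present + only_b) / n
    # Agreement expected by chance if the observers were independent
    p_e = pa * pb + (1 - pa) * (1 - pb)
    kappa = None if p_e == 1 else (p_o - p_e) / (1 - p_e)
    return p_o, kappa
```

With hypothetical counts, `agreement_2x2(40, 5, 5, 50)` gives p_o = 0.90 and kappa of about 0.80, whereas `agreement_2x2(90, 0, 0, 0)` gives p_o = 1.0 and kappa `None`. This is why a reflex can show near-perfect agreement yet have no reportable kappa.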

Subject(s)

Dog Diseases , Spinal Cord Diseases , Humans , Dogs , Animals , Observer Variation , Reflex , Extremities , Spinal Cord Diseases/veterinary , Reproducibility of Results
